UNBIASED CONDITIONAL INFERENCE FOREST (cforest)


1 Introduction


The following tutorial is part of the seminar “Species Distribution Modeling”. Its aim is to introduce you to the cforest method by modeling the distribution of butterfly species in Pakistan and creating a species richness map from the results.



Figure 1: Aglais Cashmirensis




2 What is cforest?


Cforest, like Random Forest, is a statistical method for classification and regression. It uses an algorithm that builds an ensemble of decision trees from several independent variables (HOTHORN et al. 2006, p. 1). In contrast to linear regression, Random Forest and cforest have the advantage that they can process large amounts of data with many complex variables in relatively little time (STROBEL et al. 2009a, p. 4). This fits our project well: we predict potential butterfly occurrences from hundreds of butterfly sites and many bioclimatic variables such as annual rainfall or temperature.




2.1 What is a decision tree?


As you can see below, a decision tree is easy to visualize. It consists of internal nodes that split the sampled data into two groups by an independent variable. This procedure reduces the impurity of the data. At the end of each branch are the leaf nodes, which contain the split data. If a leaf node contains only one category of the data, it is pure and its impurity value is zero (STROBEL et al. 2009a, p. 9). If a leaf node holds more than one category, its impurity is higher (ibid.). When the data in a leaf node are aggregated, the category with the most observations wins the decision (BREIMAN 2001, p. 6). Below you can see a small example of a decision tree with the important vocabulary: internal node, branch, variable and leaf node.
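The impurity idea can be illustrated with a few lines of base R. This is a generic sketch of the Gini impurity, not part of the tutorial’s workflow; the function name gini is our own:

```r
# Gini impurity of a leaf node: 1 - sum(p_k^2) over the class proportions p_k
gini <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

pure_leaf  <- c("A", "A", "A", "A")  # one category only -> impurity 0
mixed_leaf <- c("A", "A", "B", "C")  # several categories -> impurity > 0

gini(pure_leaf)   # 0
gini(mixed_leaf)  # 0.625
```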



Figure 2: Example of a Decision Tree




3 From Random Forest to cforest


Cforest is a further development of Random Forest and works only slightly differently. To understand cforest, it is therefore useful to explain Random Forest first.




3.1 How does Random Forest work?


Random Forest is characterized by the random selection of samples and variables (STROBEL et al. 2009a, p. 16). From a sample (for example our butterfly data), subsets are drawn at random, and decision trees are then built from them with randomly selected variables. At the end of each decision tree, i.e. in the leaf nodes, the individual observations are grouped into classes whose purity has been increased considerably by the preceding splits in the nodes. Finally, the observations of all trees are aggregated to one result: the class that occurs most often wins (BREIMAN 2001, p. 1). The repeated drawing of subsets from the data with replacement is called bootstrapping. Together with aggregating the results at the end of the decision trees, this process is called bagging (HOTHORN et al. 2004, p. 78). This procedure gives the model good stability and performs better than a single decision tree (ibid.), because individual trees will certainly contain individual errors. But if a thousand such random decision trees are grown and their results are aggregated, the prediction will be very good on average. If you want to dive deeper into the theory of Random Forests, we can recommend the student tutorial by Mandy Gimpel, which you can find here: Using Random Forest in R for Butterfly Fauna Distribution in Pakistan (geomoer.github.io).
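The bootstrapping-and-voting idea can be sketched in base R. This toy example is our own illustration (the names obs and tree_vote are made up); each “tree” simply votes for the majority class of its bootstrap sample:

```r
set.seed(42)
obs <- c(rep(1, 60), rep(0, 40))  # 60 presences, 40 absences

# one "tree": draw a bootstrap sample (with replacement) and vote for its majority class
tree_vote <- function() {
  boot <- sample(obs, replace = TRUE)
  as.integer(mean(boot) >= 0.5)
}

# bagging: repeat many times and aggregate the votes
votes <- replicate(500, tree_vote())
mean(votes)                     # share of trees voting "presence"
as.integer(mean(votes) >= 0.5)  # aggregated prediction
```

Each individual bootstrap sample can be misleading, but averaging over many of them stabilizes the prediction, which is the point made above.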




3.2 Variable selection bias in Random Forest


After explaining Random Forest, we can move on to its further development: cforest. One problem with Random Forest’s decision trees is that they estimate certain variables to be more important than they actually are (STROBEL et al. 2009b, p. 1). These are especially variables with many categories or continuous values, but also variables with many missing measurements (STROBEL et al. 2009a, p. 30). This creates a bias that distorts the importance of the variables and impairs the validity of the model (ibid.). Such a bias arises especially when many heterogeneous variables with different numbers of categories are used to build the decision trees (HOTHORN et al. 2006, p. 2). The cause of the incorrect estimation lies in the calculation of the Gini importance or the permutation importance of a variable. These two measures show how well a variable can clean up the data in a node by splitting it (STROBEL et al. 2009a, p. 8). In a nutshell: when the variable importance measure is biased towards variables with many categories, these variables are preferred when building the decision trees, so the output of the model is biased in an incorrect direction (STROBEL et al. 2007, p. 2). Since we predict butterfly occurrence from data with a wide range of continuous values, e.g. the annual precipitation in the variable bio_12 (WORLDCLIM 2022), alongside data with fewer distinct values, e.g. the minimum temperature of the coldest month in the variable bio_6, it makes sense to avoid such a bias in variable importance by using cforest. One more note: there are further studies that try to solve the problem of biased variable selection with a corrected impurity measure (AIR) (NEMBRINI et al. 2018, p. 3711f.).
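One source of this bias is easy to see: a continuous variable offers far more candidate cutpoints than a binary one, so it has more chances to produce a seemingly good split by chance. A small base R illustration (our own sketch, not part of the workflow):

```r
set.seed(1)
x_cont <- runif(100)                           # continuous predictor
x_bin  <- sample(c(0, 1), 100, replace = TRUE) # binary predictor

# number of candidate cutpoints each variable offers a splitting algorithm
length(unique(x_cont)) - 1  # 99
length(unique(x_bin)) - 1   # 1
```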




3.3 Solving the variable selection bias problem with cforest


But how does cforest manage to circumvent this bias within the independent variables? It does so by measuring the significance of each variable selected for a node in the decision tree. To be clear, cforest still creates random forests, but the Gini importance and the permutation importance are taken out of the race when it comes to selecting the splitting variables (STROBEL et al. 2007, p. 3). Instead, the variable with the highest significance, i.e. the one that explains the largest part of the dependent variable, is used as the splitting variable within a node (HOTHORN et al. 2006, p. 2). All other variables that are also significant, and thus explain a certain proportion of our dependent variable, can also be used in the decision tree. However, if the significance test for a variable fails, e.g. its p-value lies above 0.05, then the tree stops growing at this point (ibid., p. 4). This circumvents the bias in variable weighting, achieves a better predictive accuracy of the model and makes it possible to analyse data with heterogeneous variables. The trees created with the cforest function in R are called conditional inference trees (STROBEL et al. 2007, p. 5).
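For a hands-on impression, a single conditional inference tree can be grown with ctree() from the partykit package (which the workflow below loads anyway). This is a generic sketch on the built-in iris data, not part of the butterfly workflow; mincriterion is 1 minus the significance level used to stop tree growth:

```r
library(partykit)

# a single conditional inference tree: each split variable is chosen via a
# permutation test; growth stops when no split is significant at the
# 1 - mincriterion level (here alpha = 0.05)
ct <- partykit::ctree(Species ~ ., data = iris,
                      control = partykit::ctree_control(mincriterion = 0.95))
print(ct)
```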




3.4 Even more bias problems with variables that have many categories?


Unfortunately, there is a small problem with the selection of variables in cforest as well. If bootstrapping, i.e. the repeated drawing from a sample with replacement, is performed with heterogeneous variables, then variables with many categories are again preferred, even in cforest (STROBEL et al. 2007, p. 17). This problem can be solved by drawing the observations without replacement. Then variables with many categories are no longer used more often than other variables for building the decision trees (ibid.).
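The difference between drawing with and without replacement is easy to demonstrate in base R (our own sketch):

```r
set.seed(7)
idx_boot <- sample(1:10, 10, replace = TRUE)  # bootstrap: duplicates possible
idx_sub  <- sample(1:10, 7,  replace = FALSE) # subsample: every index at most once

any(duplicated(idx_boot))
any(duplicated(idx_sub))  # FALSE by construction
```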




3.5 Overfitting as a second problem of decision trees


Another problem with decision trees is overfitting (HOTHORN et al. 2006, p. 2). Overfitting occurs when the model learns structures that are inherent in the learning sample (STROBEL et al. 2009a, p. 9) but are not found in reality; they are only created by the arbitrary selection of the sample. To prevent this adaptation of the model to the learning data, the size of the decision trees is limited. They can simply be pruned after a certain number of nodes, or their growth can be stopped by a statistical criterion. In cforest, for example, the growth of a decision tree stops as soon as a selected significance level is no longer reached (HOTHORN et al. 2006, p. 7). This also answers the question of how large decision trees should grow. STROBEL et al. also point out that it can be useful to limit the size of the decision trees to avoid overfitting (STROBEL et al. 2009a, p. 10). Especially with large data sets, the predictive accuracy of the model seems to increase when many small trees are used. For small samples it is the other way round: here the use of large trees makes sense (ibid., p. 33). Since we often have only a small sample per butterfly species, we should follow this advice and use large decision trees.
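To see the effect of limiting tree size, partykit’s ctree_control() offers a maxdepth argument. A sketch on the built-in iris data (our own illustration, not part of the butterfly workflow):

```r
library(partykit)

# an unrestricted tree vs. one whose depth is capped at 1
full    <- partykit::ctree(Species ~ ., data = iris)
shallow <- partykit::ctree(Species ~ ., data = iris,
                           control = partykit::ctree_control(maxdepth = 1))

# compare the number of leaf nodes of each tree
length(partykit::nodeids(full,    terminal = TRUE))
length(partykit::nodeids(shallow, terminal = TRUE))  # at most 2
```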




4 A short overview of the bioclim data


For our species distribution model, we will use the freely available bioclimatic variables from WorldClim for Pakistan. They comprise 19 different variables generated from monthly temperature and rainfall data (WORLDCLIM 2022). On closer inspection you can see that they measure annual trends (for example the annual mean temperature), seasonality, and extreme environmental factors (ibid.). An example of an extreme factor is the precipitation of the driest quarter, which could be a limiting environmental factor for the occurrence of our butterflies in Pakistan. Below you can see all 19 variables that are part of the WorldClim bioclimatic data:

BIO1 = Annual Mean Temperature

BIO2 = Mean Diurnal Range (Mean of monthly (max temp - min temp))

BIO3 = Isothermality (BIO2/BIO7) (×100)

BIO4 = Temperature Seasonality (standard deviation ×100)

BIO5 = Max Temperature of Warmest Month

BIO6 = Min Temperature of Coldest Month

BIO7 = Temperature Annual Range (BIO5-BIO6)

BIO8 = Mean Temperature of Wettest Quarter

BIO9 = Mean Temperature of Driest Quarter

BIO10 = Mean Temperature of Warmest Quarter

BIO11 = Mean Temperature of Coldest Quarter

BIO12 = Annual Precipitation

BIO13 = Precipitation of Wettest Month

BIO14 = Precipitation of Driest Month

BIO15 = Precipitation Seasonality (Coefficient of Variation)

BIO16 = Precipitation of Wettest Quarter

BIO17 = Precipitation of Driest Quarter

BIO18 = Precipitation of Warmest Quarter

BIO19 = Precipitation of Coldest Quarter




5 Modeling Workflow




5.1 Loading the packages we need and setting the working directory


Used packages:

library(caret)
library(sf)
library(geodata)
library(dismo)
library(dplyr)
library(mapview)
library(partykit)
library(party)
library(moreparty)
library(terra)
library(Metrics)
library(ecospat)
library(raster)

Working directory:

wd = "C:\\UNI\\SoSe_23\\SDM\\SDMworkflow"
setwd(wd)




5.2 Load in our data


We start by reading the species file “PakistanLadakh.csv” provided to us in the course. It is in CSV format and contains presence points of individual butterfly species with x and y coordinates located in Pakistan.

data <- read.csv(file.path(wd, "data", "PakistanLadakh.csv"))

And change the column names

colnames(data) <- c("species","x","y")

We can count the number of occurrences per species to see a bit of the structure.

spec_names <- unique(data$species)
cat("The Dataset contains", format(length(spec_names)), "different species 
with overall", format(length(data$species)), "presence entries, 
and an average of", format(round(length(data$species)/length(spec_names),0)), "presence entries per species.")
## The Dataset contains 429 different species 
## with overall 5870 presence entries, 
## and an average of 14 presence entries per species.

After that we load the environmental variables for Pakistan. We just use the bioclimatic ‘bio’ data provided by WorldClim via the geodata package. You can use many more variables as predictors, as you can see on the GitHub page “rspatial / geodata”.

# get the environmental layers:
subfolder_path <- file.path(wd, "data")
r <- geodata::worldclim_country(country="PAK", res=10, var="bio", path=subfolder_path, version = "2.1")
names(r) <- substr(names(r), 11, 20)

# mask to border of Pakistan
r <- terra::mask(r, geodata::gadm(country="Pakistan", path=subfolder_path))

# save the raster for later use
terra::writeRaster(r, file.path(wd, "data", "bioclim_01.tif"), overwrite=T)

We can check and plot one of the environmental variables.

plot(r$bio_9, main="Bio_9 - Mean Temperature of Driest Quarter")




5.3 Data preparation


For testing the model we select just one species from the data, in this case Aglais caschmirensis.

Aglais_caschmirensis <- dplyr::filter(data, species=="Aglais_caschmirensis")

# We are transforming the dataframe into an sf object
Aglais_caschmirensis <- sf::st_as_sf(Aglais_caschmirensis, coords=c("x", "y"), remove=F, crs=sf::st_crs("epsg:4326"))

Next we create some background (absence) points, in this example 1000.

bg <- sf::st_as_sf(as.data.frame(predicts::backgroundSample(mask=r, n=1000)), crs=terra::crs(r), coords=c("x","y"), remove=F)

# extract environmental information for background data from SpatRaster r
bg_extr <- terra::extract(r, bg)
bg <- cbind(bg,bg_extr);rm(bg_extr)

#and binding the extracted data to one dataframe
Aglais_caschmirensis_extr <- terra::extract(r,Aglais_caschmirensis)
Aglais_caschmirensis <- cbind(Aglais_caschmirensis, Aglais_caschmirensis_extr);rm(Aglais_caschmirensis_extr)

We can plot the data to see where the butterflies were recorded. For this we use the mapview package.

r_rast <- raster::raster(r)
mapview(r_rast, layer.name = "Aglais Caschmirensis", col.regions = RColorBrewer::brewer.pal(5, "RdYlBu"), map.types = "Esri.WorldImagery") + Aglais_caschmirensis

And now we split the data of the selected butterfly species into a training and a testing set using the dplyr package and save both. We decided on 85 percent training and 15 percent testing.

`%not_in%`<- Negate(`%in%`)

testData=dplyr::slice_sample(Aglais_caschmirensis, prop=.15)
trainData=dplyr::filter(Aglais_caschmirensis, ID %not_in% testData$ID )

sf::write_sf(testData, file.path(wd, "data", "Aglais_caschmirensis_testData_01.gpkg"))
sf::write_sf(trainData, file.path(wd, "data","Aglais_caschmirensis_trainData_01.gpkg"))
sf::write_sf(bg, file.path(wd, "data","background_01.gpkg"))

Now we load the saved training and background data and build one dataframe with presence and absence points for the modeling.

# read the presence only data
po <- sf::read_sf(file.path(wd, "data", "Aglais_caschmirensis_trainData_01.gpkg"))
po$occ=1
# read the absence only data
bg <- sf::read_sf(file.path(wd, "data","background_01.gpkg"))
bg$occ=0
# we can delete the species column because we have just one species
po <- po %>% select(-species)
#str(po) you can have a look at the structure

We bind the presence and absence data to one dataframe and delete the geom column.

data <- rbind(po,bg)%>%as.data.frame()%>%dplyr::select(-geom)




5.4 Let’s run this thing


In the next step we build the model with the caret package and cforest as the method. For the controls we selected cforest_unbiased with an mtry of 3 and an ntree of 50; the defaults are 5 and 500, but these need a lot of time and computing resources. After training the model we can use the varImp function to look at the importance of the variables.

set.seed(2023) #set seed for reproducibility

model1 <- caret::train(occ ~ bio_1 + bio_2 + bio_3 + bio_4 + bio_5 + bio_6 + bio_7 + bio_8 + bio_9 + bio_10 + bio_11 + bio_12 + bio_13 + bio_14 + bio_15 + bio_16 + bio_17 + bio_18 + bio_19,
                       data = data,
                       method = "cforest", 
                       controls = cforest_unbiased(mtry = 3, ntree = 50),
                       na.action = na.exclude)
varImp(model1)
## cforest variable importance
## 
##        Overall
## bio_14 100.000
## bio_17  87.962
## bio_1   59.904
## bio_9   40.853
## bio_2   35.194
## bio_12  30.021
## bio_5   26.928
## bio_8   24.336
## bio_19  23.050
## bio_3   17.300
## bio_7   12.454
## bio_6   11.234
## bio_11   9.743
## bio_10   9.714
## bio_13   8.314
## bio_18   8.295
## bio_16   7.574
## bio_15   2.230
## bio_4    0.000

Now it is time to predict, save the result, read it in again in raster format and plot it with mapview.

pred_model1 <- terra::predict(object = r, model = model1, OOB=TRUE, na.rm = T, cores=2, cpkgs="party")
plot(pred_model1)

terra::writeRaster(pred_model1, file.path(wd, "data", "prediction_aglais_caschmirensis_01.tif"), overwrite=TRUE)
pred_model1 <- terra::rast(file.path(wd, "data","prediction_aglais_caschmirensis_01.tif"))
rast_01 <- raster::raster(file.path(wd, "data","prediction_aglais_caschmirensis_01.tif"))
mapview(rast_01, zcol = "Prediction of Aglais Caschmirensis", col.regions = RColorBrewer::brewer.pal(5, "RdYlBu"), map.types = "Esri.WorldImagery") + po




5.5 Results


For the evaluation we use the Boyce index from the ecospat package and the AUC.

# load your testdata

testData <- sf::read_sf(file.path(wd, "data", "Aglais_caschmirensis_testData_01.gpkg"))

# calculate the boyce-index:
boyceIndex <- ecospat::ecospat.boyce(fit=rast_01, obs=testData)

boyceIndex <- boyceIndex$cor

# extract the values of the testdata from the prediction raster for the AUC
extrTest <- terra::extract(pred_model1, testData)
colnames(extrTest) <- c("ID", "predicted")
extrTest$observed <- 1

# load background data and extract
bg <- sf::read_sf(file.path(wd, "data", "background_01.gpkg"))
extrBg <- terra::extract(pred_model1, bg)
colnames(extrBg) <- c("ID", "predicted")
extrBg$observed <- 0

# bind both dataframes together:
testData_boyce <- rbind(extrTest, extrBg)
#rm(extrTest, extrBg, bg)

# calculate the AUC
AUC <- Metrics::auc(actual = testData_boyce$observed, predicted = testData_boyce$predicted)

print(AUC)
## [1] 0.3232857
print(boyceIndex)
## [1] 0.592


6 Final model and Species richness map

After testing the model on one single species, we can go for a larger prediction using several species and a loop for the modeling.

We load all the needed data again and filter by the number of occurrences per species; for demonstration we use a threshold of 65 presence points per species.

data <- read.csv(file.path(wd, "data", "PakistanLadakh.csv"))
colnames(data) <- c("species","x","y")

Filter the data

species_butterflies <- table(data$species)

x <- 65  # Threshold can vary
filtered_species <- names(species_butterflies[species_butterflies >= x])  # keep species with at least x records

data <- data[data$species %in% filtered_species, ]
# We are transforming the dataframe again into an sf object
data <- sf::st_as_sf(data, coords=c("x", "y"), remove=F, crs=sf::st_crs("epsg:4326"))

Loading the environmental data

# get the environmental layers:
subfolder_path <- file.path(wd, "data")
r <- geodata::worldclim_country(country="PAK", res=10, var="bio", path=subfolder_path, version = "2.1")
names(r) <- substr(names(r), 11, 20)

# mask to border of Pakistan
r <- terra::mask(r, geodata::gadm(country="Pakistan", path=subfolder_path))

# save the raster for later use
terra::writeRaster(r, file.path(wd, "data", "bioclim_subset_01.tif"), overwrite=T)

Creating background points; for this task we chose 5000 random absence points.

# create background points
bg <- sf::st_as_sf(as.data.frame(predicts::backgroundSample(mask=r, n=5000)), crs=terra::crs(r), coords=c("x","y"), remove=F)

# extract environmental information for background data
bg_extr <- terra::extract(r, bg)
bg <- cbind(bg, bg_extr); rm(bg_extr)

data_extr <- terra::extract(r, data)
data <- cbind(data, data_extr);rm(data_extr)

Again we split into training and testing data.

`%not_in%`<- Negate(`%in%`)

testData <- dplyr::slice_sample(data, prop=.15)
trainData <- dplyr::filter(data, ID %not_in% testData$ID )

sf::write_sf(testData, "testData_subset_01.gpkg")
sf::write_sf(trainData, "trainData_subset_01.gpkg")
sf::write_sf(bg, "background_subset_01.gpkg")

Reading the presence and absence data and creating a new column called ‘species’ in the background dataframe bg, with all rows filled with ‘bg’ for background points.

# read the presence only data
po <- sf::read_sf("trainData_subset_01.gpkg")
po$occ=1
# read the absence only data
bg <- sf::read_sf("background_subset_01.gpkg")
bg$occ=0
bg$species <- rep("bg", nrow(bg))
# create again one dataset with presence and background data
data <- rbind(po,bg)%>%as.data.frame()%>%dplyr::select(-geom)

Creating some empty lists and preparing for modeling in a loop.

# we don't use the background points as a single species for the model, so we can skip them.
bg <- "bg"

unique_species <- unique(subset(data, species != bg)$species)
models <- list()
predictions <- list()

The loop for modeling

set.seed(2023)

for (i in unique_species) {
  # filter actual species
  data_species <- subset(data, species %in% c(i, "bg"))
  
  # cforest-model for actual species
  model <- caret::train(occ ~ bio_1 + bio_2 + bio_3 + bio_4 + bio_5 + bio_6 + bio_7 + bio_8 + bio_9 + bio_10 + bio_11 + bio_12 + bio_13 + bio_14 + bio_15 + bio_16 + bio_17 + bio_18 + bio_19,
                        data = data_species,
                        method = "cforest", 
                        controls = cforest_unbiased(mtry = 3, ntree = 50),
                        na.action = na.exclude)
  
  # model to the list
  models[[i]] <- model
  
  # predictions
  pred <- terra::predict(object = r, model = model, OOB=TRUE, na.rm = T, cores=2, cpkgs="party")
  predictions[[i]] <- pred
  
}

To find a threshold for the prediction we can use a function by Cecina Babich Morrow; for more information, click the following link: Thresholding species distribution models.

# function of Cecina Babich Morrow
sdm_threshold <- function(sdm, occs, type = "mtp", binary = FALSE){
  occPredVals <- raster::extract(sdm, occs)
  if(type == "mtp"){
    thresh <- min(na.omit(occPredVals))
  } else if(type == "p10"){
    if(length(occPredVals) < 10){
      p10 <- floor(length(occPredVals) * 0.9)
    } else {
      p10 <- ceiling(length(occPredVals) * 0.9)
    }
    thresh <- rev(sort(occPredVals))[p10]
  }
  sdm_thresh <- sdm
  sdm_thresh[sdm_thresh < thresh] <- NA
  if(binary){
    sdm_thresh[sdm_thresh >= thresh] <- 1
  }
  return(sdm_thresh)
}

Looping over the single-species predictions and combining them into one raster stack for the species richness map.

stack_sdm <- rast(predictions[1])


for (i in 2:length(predictions)) {
  raster <- rast(predictions[i])
  stack_sdm <- merge(stack_sdm, raster)
}

# saving the raster stack
terra::writeRaster(stack_sdm, file.path(wd, "data", "species_richness_sdm_01.tif"), overwrite=T)

Plotting the species richness map

rast_02 <- raster::raster(file.path(wd, "data","species_richness_sdm_01.tif"))
mapview(rast_02, zcol = "Species richness map", col.regions = RColorBrewer::brewer.pal(5, "RdYlBu"), map.types = "Esri.WorldImagery", legend = TRUE, layer.name = "Species richness map") + po

For the evaluation we use the Boyce index and the AUC again.

stack_sdm <- terra::rast(file.path(wd, "data","species_richness_sdm_01.tif"))

# calculate the boyce-index:
boyceIndex <- ecospat::ecospat.boyce(fit=raster(stack_sdm), obs=testData, method = "spearman")

boyceIndex <- boyceIndex$cor

# extract the values of the testdata from the prediction raster for the AUC
extrTest <- terra::extract(stack_sdm, testData)
colnames(extrTest) <- c("ID", "predicted")
extrTest$observed <- 1

# load background data and extract
bg <- sf::read_sf("background_subset_01.gpkg")
extrBg <- terra::extract(stack_sdm, bg)
colnames(extrBg) <- c("ID", "predicted")
extrBg$observed <- 0

# bind both dataframes together:
testData_02 <- rbind(extrTest, extrBg)
#rm(extrTest, extrBg, bg)

# calculate the AUC
AUC <- Metrics::auc(actual = testData_02$observed, predicted = testData_02$predicted)

print(AUC)
## [1] 0.3181433
print(boyceIndex)
## [1] 0.957




7 Sources


7.1 Bibliography:


BREIMAN, L. (2001): Random Forests. Machine Learning. 45: 5–32.

HOTHORN, T., HORNIK, K. & A. ZEILEIS (2006): Unbiased Recursive Partitioning: A Conditional Inference
Framework. Journal of Computational and Graphical Statistics 15/3: 651–674.

HOTHORN, T., LAUSEN, B., BENNER, A. & M. RADESPIEL-TRÖGER (2004): Bagging survival trees. Statistics in Medicine 23: 77–91.

NEMBRINI, S., KÖNIG, I. R. & M. N. WRIGHT (2018): The revival of the Gini importance? Bioinformatics 34: 3711–3718.

STROBEL, C., MALLEY, J. & G. TUTZ (2009a): An Introduction to Recursive Partitioning. Munich.

STROBEL, C., HOTHORN, T. & A. ZEILEIS (2009b): Party on! A New, Conditional Variable Importance Measure for Random Forests Available in the party Package. Munich.

STROBEL, C., BOULESTEIX, A.-C., ZEILEIS, A. & T. HOTHORN (2007): Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics 8: 25.

WORLDCLIM (2022): Bioclimatic variables. https://worldclim.org/data/bioclim.html (accessed: 03.06.2023).




7.2 Images


Figure 1: Aglais Cashmirensis (accessed: 07.06.2023)

Figure 2: Own representation

8 Session info

sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=German_Germany.utf8  LC_CTYPE=German_Germany.utf8   
## [3] LC_MONETARY=German_Germany.utf8 LC_NUMERIC=C                   
## [5] LC_TIME=German_Germany.utf8    
## 
## attached base packages:
## [1] stats4    grid      stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] ecospat_3.5       Metrics_0.1.4     moreparty_0.3.1   party_1.3-13     
##  [5] strucchange_1.5-3 sandwich_3.1-0    zoo_1.8-12        modeltools_0.2-23
##  [9] partykit_1.2-20   mvtnorm_1.2-0     libcoin_1.0-9     mapview_2.11.0   
## [13] dplyr_1.1.2       dismo_1.3-14      raster_3.6-20     sp_1.6-1         
## [17] geodata_0.5-8     terra_1.7-29      sf_1.0-13         caret_6.0-94     
## [21] lattice_0.21-8    ggplot2_3.4.2     knitr_1.43       
## 
## loaded via a namespace (and not attached):
##   [1] uuid_1.1-0              backports_1.4.1         Hmisc_5.1-0            
##   [4] systemfonts_1.0.4       plyr_1.8.8              rclipboard_0.1.6       
##   [7] splines_4.2.2           crosstalk_1.2.0         listenv_0.9.0          
##  [10] leaflet_2.1.2           TH.data_1.1-2           leafpop_0.1.0          
##  [13] digest_0.6.31           foreach_1.5.2           htmltools_0.5.5        
##  [16] earth_5.3.2             leaflet.providers_1.9.0 fansi_1.0.4            
##  [19] checkmate_2.2.0         magrittr_2.0.3          cluster_2.1.4          
##  [22] ks_1.14.0               recipes_1.0.6           globals_0.16.2         
##  [25] gower_1.0.1             matrixStats_1.0.0       predicts_0.1-8         
##  [28] svglite_2.1.1           hardhat_1.3.0           timechange_0.2.0       
##  [31] colorspace_2.1-0        rgdal_1.6-7             xfun_0.39              
##  [34] leafem_0.2.0            jsonlite_1.8.4          brew_1.0-8             
##  [37] survival_3.5-5          iterators_1.0.14        ape_5.7-1              
##  [40] glue_1.6.2              PresenceAbsence_1.1.11  gtable_0.3.3           
##  [43] ipred_0.9-14            webshot_0.5.4           phosphoricons_0.2.0    
##  [46] future.apply_1.11.0     abind_1.4-5             scales_1.2.1           
##  [49] DBI_1.1.3               Rcpp_1.0.10             plotrix_3.8-2          
##  [52] htmlTable_2.4.1         xtable_1.8-4            units_0.8-2            
##  [55] foreign_0.8-84          mclust_6.0.0            proxy_0.4-27           
##  [58] Formula_1.2-5           lava_1.7.2.1            prodlim_2023.03.31     
##  [61] DT_0.28                 htmlwidgets_1.6.2       RColorBrewer_1.1-3     
##  [64] nabor_0.5.0             ellipsis_0.3.2          farver_2.1.1           
##  [67] reshape_0.8.9           pkgconfig_2.0.3         nnet_7.3-19            
##  [70] sass_0.4.6              utf8_1.2.3              tidyselect_1.2.0       
##  [73] rlang_1.1.1             reshape2_1.4.4          later_1.3.1            
##  [76] munsell_0.5.0           TeachingDemos_2.12      tools_4.2.2            
##  [79] cachem_1.0.8            cli_3.6.1               generics_0.1.3         
##  [82] ade4_1.7-22             evaluate_0.21           stringr_1.5.0          
##  [85] fastmap_1.1.1           yaml_2.3.7              maxnet_0.1.4           
##  [88] ModelMetrics_1.2.2.2    randomForest_4.7-1.1    purrr_1.0.1            
##  [91] satellite_1.0.4         coin_1.4-2              future_1.32.0          
##  [94] nlme_3.1-162            mime_0.12               pracma_2.4.2           
##  [97] biomod2_4.2-3           compiler_4.2.2          rstudioapi_0.14        
## [100] png_0.1-8               e1071_1.7-13            tibble_3.2.1           
## [103] bslib_0.4.2             stringi_1.7.12          highr_0.10             
## [106] plotmo_3.6.2            poibin_1.5              Matrix_1.5-4.1         
## [109] classInt_0.4-9          permute_0.9-7           vegan_2.6-4            
## [112] gbm_2.1.8.1             vctrs_0.6.2             pillar_1.9.0           
## [115] lifecycle_1.0.3         jquerylib_0.1.4         data.table_1.14.8      
## [118] httpuv_1.6.11           R6_2.5.1                promises_1.2.0.1       
## [121] gridExtra_2.3           KernSmooth_2.23-21      parallelly_1.36.0      
## [124] codetools_0.2-19        measures_0.3            gtools_3.9.4           
## [127] MASS_7.3-60             shinyWidgets_0.7.6      withr_2.5.0            
## [130] multcomp_1.4-23         varImp_0.4              mgcv_1.8-42            
## [133] parallel_4.2.2          rpart_4.1.19            timeDate_4022.108      
## [136] class_7.3-22            rmarkdown_2.22          inum_1.0-5             
## [139] mda_0.5-3               pROC_1.18.2             shiny_1.7.4            
## [142] lubridate_1.9.2         base64enc_0.1-3